Added sitemap.xml gen #20

Open: wants to merge 4 commits into base: master

forrest321

Added sitemap.xml gen

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here (e.g. I signed it!) and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

@forrest321
Author

I signed it!

@forrest321 forrest321 closed this Nov 12, 2018
@forrest321 forrest321 reopened this Nov 12, 2018
@forrest321 forrest321 closed this Nov 12, 2018
@forrest321
Author

I signed it!

@forrest321 forrest321 reopened this Nov 13, 2018
@googlebot

CLAs look good, thanks!

crawlsite.js Outdated
var p = "";
crawledPages.forEach(element => {
var n = "\t\t<url>\n";
n = n + "\t\t\t<loc>\n";
Contributor

Is there a Node library we could use instead of manually crafting the XML?

crawlsite.js Outdated
@@ -161,6 +162,28 @@ async function crawl(browser, page, depth = 0) {
}
}

function buildSitemap() {
if (SITEMAP && crawledPages) {
var p = "";
Contributor

Use let throughout
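
To illustrate the two review comments above, here is a minimal sketch of a sitemap builder that avoids `var` and string concatenation. It is not the code from this PR: `buildSitemapXml` and `escapeXml` are hypothetical names, it still builds the XML by hand rather than with a library, and it assumes the crawled URLs are available as an iterable of strings (for example the keys of `crawledPages`, if that is a `Map` keyed by URL).

```javascript
// Hypothetical helper: escape the five characters that are special in XML text.
const XML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&apos;',
};

function escapeXml(text) {
  return text.replace(/[&<>"']/g, (ch) => XML_ENTITIES[ch]);
}

// Hypothetical builder: takes any iterable of URL strings and returns
// the sitemap document as a single string.
function buildSitemapXml(urls) {
  const entries = [...urls].map(
    (url) => `  <url>\n    <loc>${escapeXml(url)}</loc>\n  </url>`
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    '</urlset>',
  ].join('\n') + '\n';
}
```

If `crawledPages` is keyed by URL, the call could look like `buildSitemapXml(crawledPages.keys())`. Escaping matters mostly for query strings, where a raw `&` would make the XML invalid.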

@abdonrd

abdonrd commented Jan 7, 2019

Interested in this!

@forrest321
Author

forrest321 commented Sep 21, 2019

Requested changes are done.

@forrest321
Author

This one fell off my radar, thought I'd get it done. Hope that helps.

@Kiina

Kiina commented Oct 5, 2020

Just a note: the sitemap generator might need a filter for same-page anchor links (e.g. example.com/#home). I'm fairly sure they shouldn't be included in a sitemap in basically any case, but they do get included in the current implementation.

My quick and dirty fix would be to add page.url = page.url.replace(/#.*$/,''); before the if (crawledPages.has(page.url)) { check so that it removes the anchors, but I'm not sure how that interferes with the original crawl function when people want to build the graph.
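
One way to try the idea above without touching `page.url` is to normalize only the key used for the duplicate check. The sketch below is an assumption-laden illustration, not the crawler's actual code: `stripFragment` and `markCrawled` are hypothetical names, and `crawledPages` is assumed to be a `Map` keyed by URL, as the `crawledPages.has(page.url)` check suggests.

```javascript
// Hypothetical helper: drop the #fragment from an absolute URL.
// Uses the WHATWG URL class built into Node; it throws on input
// without a scheme (e.g. "example.com/#home").
function stripFragment(rawUrl) {
  const url = new URL(rawUrl);
  url.hash = '';
  return url.href;
}

// Assumed shape: a Map from normalized URL to page record.
const crawledPages = new Map();

// Hypothetical helper: record a page under its fragment-free URL.
// Returns false if an equivalent page was already recorded.
// The page object itself is left unchanged, so code that builds
// the graph still sees the original URL.
function markCrawled(page) {
  const key = stripFragment(page.url);
  if (crawledPages.has(key)) {
    return false;
  }
  crawledPages.set(key, page);
  return true;
}
```

With this approach, `https://example.com/#home` and `https://example.com/#about` collapse to the single key `https://example.com/`, so the sitemap would list the page once while the crawl data keeps whatever URL was first seen.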
